triple descent
Triple descent and the two kinds of overfitting: where & why do they appear?
A recent line of research has highlighted the existence of a "double descent" phenomenon in deep learning, whereby increasing the number of training examples N causes the generalization error of neural networks to peak when N is of the same order as the number of parameters P. In earlier works, a similar phenomenon was shown to exist in simpler models such as linear regression, where the peak instead occurs when N is equal to the input dimension D. Since both peaks coincide with the interpolation threshold, they are often conflated in the literature. In this paper, we show that despite their apparent similarity, these two scenarios are inherently different. In fact, both peaks can co-exist when neural networks are applied to noisy regression tasks. The relative size of the peaks is then governed by the degree of nonlinearity of the activation function. Building on recent developments in the analysis of random feature models, we provide a theoretical ground for this sample-wise triple descent.
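The sample-wise picture described above is easy to probe numerically. Below is a minimal sketch (not the authors' code; the dimensions, activation, and noise level are illustrative choices) of random feature ridgeless regression on a noisy linear teacher, sweeping the number of samples N past both D and P:

```python
# Illustrative sketch of sample-wise double/triple descent in a random
# feature model; all sizes and the noise level are assumed, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
D, P, noise = 30, 300, 0.5                     # input dim, random features, label noise std
W = rng.standard_normal((P, D)) / np.sqrt(D)   # fixed random first layer
teacher = rng.standard_normal(D) / np.sqrt(D)

def features(X):
    return np.tanh(X @ W.T)                    # nonlinear random features

def test_error(N, n_test=2000, reps=10):
    errs = []
    for _ in range(reps):
        X = rng.standard_normal((N, D))
        y = X @ teacher + noise * rng.standard_normal(N)
        a = np.linalg.pinv(features(X)) @ y    # minimum-norm least squares fit
        Xt = rng.standard_normal((n_test, D))
        errs.append(np.mean((features(Xt) @ a - Xt @ teacher) ** 2))
    return np.mean(errs)

for N in [10, 20, 30, 40, 100, 200, 300, 400, 1000]:
    print(f"N={N:5d}  test MSE={test_error(N):.3f}")
# With enough label noise, the curve typically shows a bump near N=D=30
# (the "linear" peak) and a larger spike near N=P=300 (the "nonlinear" peak).
```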
Review for NeurIPS paper: Triple descent and the two kinds of overfitting: where & why do they appear?
The reviewers unanimously appreciated the conceptual novelty of the paper, in which the authors separate the two potential phenomena causing non-monotonic test error behavior as a function of the number of samples. This is very relevant work for the conference, and as such the reviewers have provided extensive feedback. I urge the authors to take into account the detailed feedback in their revision. Additionally, below is the anonymized transcript of some interesting discussion points which I believe highlight some confusions in the paper, and I strongly encourage the authors to address them. Most importantly, please address, with a mathematical proof or extensive empirical evidence, the following concern raised by R1 regarding one of the main claims of the paper: the claim that the linear peak is exhibited only in the presence of noise is not, as such, justified in the paper (the authors cite [6], but [6] covers only linear models). I believe that with nonlinear RF models there might still be variance terms arising from the initialization and the training data; in other words, it is not clear whether the total variance can exhibit a linear peak even when SNR $\to \infty$ (no noise).
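For concreteness, a minimal version of the noiseless experiment that R1's concern calls for might look as follows (my own hedged sketch with illustrative sizes, not part of the review): fit nonlinear random features to noise-free labels and average the test error over many independent draws of the features and the data, checking for a residual bump near N = D.

```python
# Probe of R1's concern: does any peak near N = D survive at SNR -> infinity
# once the error is averaged over feature initializations and data draws?
import numpy as np

rng = np.random.default_rng(1)
D, P = 30, 300
teacher = rng.standard_normal(D) / np.sqrt(D)

def avg_test_error(N, n_seeds=20, n_test=2000):
    errs = []
    for _ in range(n_seeds):                       # fresh features AND data per seed
        W = rng.standard_normal((P, D)) / np.sqrt(D)
        X = rng.standard_normal((N, D))
        y = X @ teacher                            # noiseless labels
        a = np.linalg.pinv(np.tanh(X @ W.T)) @ y   # min-norm least squares
        Xt = rng.standard_normal((n_test, D))
        errs.append(np.mean((np.tanh(Xt @ W.T) @ a - Xt @ teacher) ** 2))
    return np.mean(errs)

for N in [15, 25, 30, 35, 60, 150, 300, 600]:
    print(f"N={N:4d}  mean test MSE={avg_test_error(N):.4f}")
# A monotone curve near N=30 supports the paper's claim; a residual bump would
# indicate initialization/data variance alone can produce a linear peak.
```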
Multiple Descent in the Multiple Random Feature Model
Meng, Xuran, Yao, Jianfeng, Cao, Yuan
Recent works have demonstrated a double descent phenomenon in over-parameterized learning. Although this phenomenon has been investigated in recent works, it is not yet fully understood theoretically. In this paper, we investigate the multiple descent phenomenon in a class of multi-component prediction models. We first consider a "double random feature model" (DRFM) concatenating two types of random features, and study the excess risk achieved by the DRFM in ridge regression. We calculate the precise limit of the excess risk under the high-dimensional framework where the training sample size, the dimension of the data, and the dimension of the random features tend to infinity proportionally. Based on this calculation, we further theoretically demonstrate that the risk curves of DRFMs can exhibit triple descent. We then provide a thorough experimental study to verify our theory. Finally, we extend our study to the "multiple random feature model" (MRFM), and show that MRFMs ensembling $K$ types of random features may exhibit $(K+1)$-fold descent. Our analysis points out that risk curves with a specific number of descents generally exist in learning multi-component prediction models.
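As a concrete illustration of this setup (a hedged sketch with my own parameter choices, not the authors' code), a DRFM can be formed by concatenating two random feature families, here tanh and ReLU maps with independent weights, and fitting them jointly with ridge regression:

```python
# Minimal "double random feature model" sketch: concatenate two feature types,
# fit by ridge regression, estimate the excess risk on held-out data.
# All dimensions, the teacher, and the ridge penalty are illustrative.
import numpy as np

rng = np.random.default_rng(2)
d, n1, n2, n, lam = 50, 200, 200, 300, 1e-3    # data dim, feature dims, samples, ridge
teacher = rng.standard_normal(d) / np.sqrt(d)

W1 = rng.standard_normal((n1, d)) / np.sqrt(d)
W2 = rng.standard_normal((n2, d)) / np.sqrt(d)

def drfm_features(X):
    # concatenation of the two random feature families
    return np.hstack([np.tanh(X @ W1.T), np.maximum(X @ W2.T, 0.0)])

X = rng.standard_normal((n, d))
y = X @ teacher + 0.3 * rng.standard_normal(n)

Z = drfm_features(X)                           # shape (n, n1 + n2)
a = np.linalg.solve(Z.T @ Z + lam * np.eye(n1 + n2), Z.T @ y)  # ridge solution

Xt = rng.standard_normal((5000, d))
excess_risk = np.mean((drfm_features(Xt) @ a - Xt @ teacher) ** 2)
print(f"estimated excess risk: {excess_risk:.4f}")
# Sweeping n (or n1 and n2) and re-running traces out the multi-descent
# risk curves the paper analyzes in the proportional high-dimensional limit.
```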
Triple descent and the two kinds of overfitting: Where & why do they appear?
d'Ascoli, Stéphane, Sagun, Levent, Biroli, Giulio
A recent line of research has highlighted the existence of a "double descent" phenomenon in deep learning, whereby increasing the number of training examples $N$ causes the generalization error of neural networks to peak when $N$ is of the same order as the number of parameters $P$. In earlier works, a similar phenomenon was shown to exist in simpler models such as linear regression, where the peak instead occurs when $N$ is equal to the input dimension $D$. Since both peaks coincide with the interpolation threshold, they are often conflated in the literature. In this paper, we show that despite their apparent similarity, these two scenarios are inherently different. In fact, both peaks can co-exist when neural networks are applied to noisy regression tasks. The relative size of the peaks is then governed by the degree of nonlinearity of the activation function. Building on recent developments in the analysis of random feature models, we provide a theoretical ground for this sample-wise triple descent. As shown previously, the nonlinear peak at $N\!=\!P$ is a true divergence caused by the extreme sensitivity of the output function to both the noise corrupting the labels and the initialization of the random features (or the weights in neural networks). This peak survives in the absence of noise, but can be suppressed by regularization. In contrast, the linear peak at $N\!=\!D$ is solely due to overfitting the noise in the labels, and forms earlier during training. We show that this peak is implicitly regularized by the nonlinearity, which is why it only becomes salient at high noise and is weakly affected by explicit regularization. Throughout the paper, we compare analytical results obtained in the random feature model with the outcomes of numerical experiments involving deep neural networks.
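The regularization claim in this abstract suggests a simple numerical check (again a sketch under illustrative choices, not the paper's exact setup): compare the test error at N = D and N = P as the ridge penalty grows; the nonlinear peak should shrink markedly while the noise-driven linear peak should be comparatively unaffected.

```python
# Illustrative check of how explicit ridge regularization affects the two
# peaks of a random feature model; all parameters below are assumptions.
import numpy as np

rng = np.random.default_rng(3)
D, P, noise = 30, 300, 0.5
W = rng.standard_normal((P, D)) / np.sqrt(D)
teacher = rng.standard_normal(D) / np.sqrt(D)

def ridge_test_error(N, lam, reps=10, n_test=2000):
    errs = []
    for _ in range(reps):
        X = rng.standard_normal((N, D))
        y = X @ teacher + noise * rng.standard_normal(N)
        Z = np.tanh(X @ W.T)
        a = np.linalg.solve(Z.T @ Z + lam * np.eye(P), Z.T @ y)  # ridge fit
        Xt = rng.standard_normal((n_test, D))
        errs.append(np.mean((np.tanh(Xt @ W.T) @ a - Xt @ teacher) ** 2))
    return np.mean(errs)

for lam in [1e-6, 1e-2, 1.0]:
    print(f"lam={lam:g}: err(N=D)={ridge_test_error(30, lam):.3f}"
          f"  err(N=P)={ridge_test_error(300, lam):.3f}")
# Expectation under the paper's claim: the N=P error collapses as lam grows,
# while the N=D error moves much less.
```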
The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization
Adlam, Ben, Pennington, Jeffrey
Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well. An emerging paradigm for describing this unexpected behavior is in terms of a \emph{double descent} curve, in which increasing a model's capacity causes its test error to first decrease, then increase to a maximum near the interpolation threshold, and then decrease again in the overparameterized regime. Recent efforts to explain this phenomenon theoretically have focused on simple settings, such as linear regression or kernel regression with unstructured random features, which we argue are too coarse to reveal important nuances of actual neural networks. We provide a precise high-dimensional asymptotic analysis of generalization under kernel regression with the Neural Tangent Kernel, which characterizes the behavior of wide neural networks optimized with gradient descent. Our results reveal that the test error has non-monotonic behavior deep in the overparameterized regime and can even exhibit additional peaks and descents when the number of parameters scales quadratically with the dataset size.
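A minimal sketch of NTK kernel regression in this spirit (my own construction, not the authors' code) uses the standard closed form of the infinite-width NTK of a single-hidden-layer ReLU network on unit-norm inputs; the overall kernel scale is absorbed into the ridge parameter, so only the angular dependence matters here.

```python
# Kernel ridge regression with the closed-form two-layer ReLU NTK on the
# sphere; data dimensions, teacher, and ridge penalty are illustrative.
import numpy as np

rng = np.random.default_rng(4)

def ntk_relu(X1, X2):
    # NTK of a 2-layer ReLU net for unit-norm rows: Theta(u) = u*k0(u) + k1(u)
    # with u = <x, x'>, k0(u) = (pi - arccos u)/pi,
    # k1(u) = (sqrt(1 - u^2) + u*(pi - arccos u))/pi.
    U = np.clip(X1 @ X2.T, -1.0, 1.0)
    k0 = (np.pi - np.arccos(U)) / np.pi
    k1 = (np.sqrt(1.0 - U ** 2) + U * (np.pi - np.arccos(U))) / np.pi
    return U * k0 + k1

def sphere(n, d):
    X = rng.standard_normal((n, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

d, n, lam = 20, 500, 1e-3
teacher = rng.standard_normal(d)
X, Xt = sphere(n, d), sphere(2000, d)
y = X @ teacher + 0.1 * rng.standard_normal(n)

alpha = np.linalg.solve(ntk_relu(X, X) + lam * np.eye(n), y)  # kernel ridge fit
pred = ntk_relu(Xt, X) @ alpha
print(f"test MSE: {np.mean((pred - Xt @ teacher) ** 2):.4f}")
# Sweeping n at fixed d (and onward toward n ~ d^2) probes the multi-scale,
# non-monotonic behavior the paper characterizes.
```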